NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

ContactPFP: Protein Function Prediction Using Predicted Contact Information

https://doi.org/10.3389/fbinf.2022.896295

Kagaya, Yuki; Flannery, Sean T.; Jain, Aashish; Kihara, Daisuke (June 2022, Frontiers in Bioinformatics)

Computational function prediction is one of the most important problems in bioinformatics as elucidating the function of genes is a central task in molecular biology and genomics. Most of the existing function prediction methods use protein sequences as the primary source of input information because the sequence is the most available information for query proteins. There are attempts to consider other attributes of query proteins. Among these attributes, the three-dimensional (3D) structure of proteins is known to be very useful in identifying the evolutionary relationship of proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as input structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure is not experimentally determined for many proteins. In ContactPFP, we overcome this limitation by using residue-residue contact prediction, which has become increasingly accurate due to rapid development in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contact as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP compared to contemporary sequence-based methods. There were many cases where it showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.
more » « less
Full Text Available
Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction

https://doi.org/10.1038/s41598-021-87204-z

Jain, Aashish; Terashi, Genki; Kagaya, Yuki; Maddhuri Venkata Subramaniya, Sai Raghavendra; Christoffer, Charles; Kihara, Daisuke (December 2021, Scientific Reports)
null (Ed.)
Abstract Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model prediction performance as compared to single E-value MSA features. A further improvement was observed when an attention layer was used and even more when additional prediction tasks of bond angle predictions were added. The improvement of distance predictions were successfully transferred to achieve better protein tertiary structure modeling.
more » « less
Full Text Available
A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography

https://doi.org/10.1093/molbev/msae036

Bukhman, Yury V; Morin, Phillip A; Meyer, Susanne; Chu, Li-Fang; Jacobsen, Jeff K; Antosiewicz-Bourget, Jessica; Mamott, Daniel; Gonzales, Maylie; Argus, Cara; Bolin, Jennifer; et al (March 2024, Molecular Biology and Evolution)
Gaut, Brandon (Ed.)
Abstract The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.
more » « less
Full Text Available
Protein contact map refinement for improving structure prediction using generative adversarial networks

https://doi.org/10.1093/bioinformatics/btab220

Maddhuri Venkata Subramaniya, Sai Raghavendra; Terashi, Genki; Jain, Aashish; Kagaya, Yuki; Kihara, Daisuke (March 2021, Bioinformatics)
Valencia, Alfonso (Ed.)
Abstract Motivation Protein structure prediction remains as one of the most important problems in computational biology and biophysics. In the past few years, protein residue–residue contact prediction has undergone substantial improvement, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has, therefore, become the forefront of protein structure prediction. Results We show a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN was able to make a significant improvement over predictions made by recent contact prediction methods when tested on three datasets including protein structure modeling targets in CASP13 and CASP14. We show improvement of precision in contact prediction, which translated into improvement in the accuracy of protein tertiary structure models. On the other hand, observed improvement over trRosetta was relatively small, reasons for which are discussed. ContactGAN will be a valuable addition in the structure prediction pipeline to achieve an extra gain in contact prediction accuracy. Availability and implementation https://github.com/kiharalab/ContactGAN. Supplementary information Supplementary data are available at Bioinformatics online.
more » « less
Full Text Available
A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes

https://doi.org/10.1186/s12915-022-01427-8

Toh, Huishi; Yang, Chentao; Formenti, Giulio; Raja, Kalpana; Yan, Lily; Tracey, Alan; Chow, William; Howe, Kerstin; Bergeron, Lucie A.; Zhang, Guojie; et al (November 2022, BMC Biology)

Abstract BackgroundThe Nile rat (Avicanthis niloticus) is an important animal model because of its robust diurnal rhythm, a cone-rich retina, and a propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these aspects, compared to the widely usedMus musculusandRattus norvegicusmodels, holds the promise of better translation of research findings to the clinic. ResultsWe report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with contig N50 of 11.1 Mb, scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including some that affect genes associated with type 2 diabetes and metabolic dysfunctions. We discuss 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse. ConclusionsOur findings reflect the exceptional level of genomic resolution present in this assembly, which will greatly expand the potential of the Nile rat as a model organism.
more » « less
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

https://doi.org/10.1186/s13059-019-1835-8

Zhou, Naihui; Jiang, Yuxiang; Bergquist, Timothy R.; Lee, Alexandra J.; Kacsoh, Balint Z.; Crocker, Alex W.; Lewis, Kimberley A.; Georghiou, George; Nguyen, Huy N.; Hamid, Md Nafiz; et al (December 2019, Genome Biology)

Full Text Available

Search for: All records